Abstract

In this paper, we consider the problem of estimating a potentially sensitive (individually 
stigmatizing) statistic on a population. In our model, individuals are concerned about their 
privacy, and experience some cost as a function of their privacy loss. Nevertheless, they would 
be willing to participate in the survey if they were compensated for their privacy cost. These 
cost functions are not publicly known, however, nor do we make Bayesian assumptions about 
their form or distribution. Individuals are rational and will misreport their costs for privacy if 
doing so is in their best interest. Ghosh and Roth recently showed that in this setting, when costs
for privacy loss may be correlated with private types and individuals value differential privacy,
no individually rational direct revelation mechanism can compute any non-trivial estimate of
the population statistic. In this paper, we circumvent this impossibility result by proposing a
modified notion of how individuals experience cost as a function of their privacy loss, and by
giving a mechanism which does not operate by direct revelation. Instead, our mechanism has the
ability to randomly approach individuals from a population and make them take-it-or-leave-it
offers. This is intended to model the abilities of a surveyor who may stand on a street corner
and approach passers-by.

1 Introduction 

Suppose you are a researcher and you would like to collect data from a population and perform an 
analysis on it. Presumably, you would like your sample, or at least your analysis, to be
representative of the underlying population. Unfortunately, individuals' decisions of whether to participate
in your study may skew your data: perhaps people with an embarrassing medical condition are less 
likely to respond to a survey whose results might reveal their condition. 

One could try to incentivize participation by offering a reward for participation, but this only 
serves to skew the survey in favor of those who value the reward over the costs of participating (e.g., 
hassle, time, detrimental effects of what the study might reveal), which again may not result in a 
representative sample. Ideally, you would like to be able to find out exactly how much you would 
have to pay each individual to participate in your survey (her "value", akin to a reservation price),
and offer her exactly that much. Unfortunately, traditional mechanisms for eliciting player values
truthfully are not a good match for this setting because a player's value may be correlated with her
private information (for example, individuals with an embarrassing medical condition might want
to be paid extra in order to reveal it). Standard mechanisms based on the revelation principle are
therefore no longer truthful. In fact, Ghosh and Roth [GR11] showed that when participation costs
can be arbitrarily correlated with private data, no direct revelation mechanism can simultaneously 
offer non-trivial accuracy guarantees and be individually rational for agents who value their privacy. 

Voluntarily provided data is a cornerstone of medical studies, opinion polls, human subjects 
research, and marketing studies. Some data collectors, such as the US Census, can get around the 
issue of voluntary participation by legal mandate, but this is rare. How might we still get analyses 
that represent the underlying population? 

Statisticians and econometricians have of course attempted to address selection and non-response
bias issues. One approach is to assume that the effect of unobserved variables has mean zero. The
Nobel-prize-winning Heckman correction method [Hec79] and the related literature instead attempt
to correct for non-random samples by formulating a theory for the probabilities of the unobserved
variables and using the theorized distribution to extrapolate a corrected sample. The limitations of
these approaches lie precisely in the assumptions they make on the structure of the data. Is it
possible to address these issues without needing to "correct" the observed sample, while simultaneously
minimizing the cost of running the survey?



1.1 Contributions 

The present paper provides a new model for incentivizing participation in data analyses when 
the subjects' value for their private information may be correlated with the sensitive information 
itself. In this model, we present a mechanism for eliciting responses that allows us to compute 
accurate statistical estimates, addressing the survey problem described above. We model costs for
individuals' participation using the tools and language of differential privacy; our mechanisms are
not specific to user costs defined in terms of differential privacy, but offering guarantees of this type 
can significantly decrease costs when compared to mechanisms that ask for unrestricted access to 
user data. 

Of course, a second issue beyond representative participation is truthful participation. We require
that rational agents be positively incentivized to participate in our mechanism, but once we get their
participation, there is the question of whether they will answer the survey question correctly. One
solution is to assume that survey responses are verifiable or cannot easily be fabricated (e.g., the
surveyor requires documentation of answers, or, more invasively, actually collects a blood sample
from the participant). While the approach we present in this paper works well with such verifiable
responses, our framework additionally provides a formal "almost-truthfulness" guarantee: the
expected utility a participant could gain by lying in the survey is very small. Note that
this is a different issue than participation, which is voluntary and always strictly incentivized.



1.2 Differential Privacy 

Over the past decade, differential privacy has emerged as a compelling privacy definition, and has 
received considerable attention. While we provide formal definitions in Section 2, differential privacy
essentially bounds the sensitivity of an algorithm's output to arbitrary changes in an individual's data.
In particular, it requires that the probability of any possible outcome of a computation be insensitive
to the addition or removal of one person's data from the input. Among differential privacy's many
strengths are (1) that differentially private computations are approximately truthful [MT07] (which
gives the almost-truthfulness guarantee mentioned above), and (2) that differential privacy is a
property of the mechanism and is independent of the input to the mechanism.



How Individuals Should Value Privacy: The "Paradox" of Differential Privacy

A natural approach taken by past work (e.g., [GR11]) in attempting to model the cost incurred by
participants in some computation on their private data is to model individuals as experiencing cost
as a function of the differential privacy parameter $\epsilon$ associated with the mechanism using their
data. We argue here, however, that modeling an individual's cost for privacy loss solely as any
function $f(\epsilon)$ of the privacy parameter $\epsilon$ would lead to unnatural agent behavior and incentives.

Consider an individual who is approached on a street corner and offered a deal: she can
participate in a survey in exchange for $100, or she can decline to participate and walk away. She
is given the guarantee that both her participation decision and her input to the survey (if she
opts to participate) will be treated in an $\epsilon$-differentially private manner. In the usual language of
differential privacy, what does this mean? Formally, her input to the mechanism will be the tuple
containing her participation decision and her private type. If she decides not to participate, the
mechanism output is not allowed to depend on her private type, and switching her participation
decision to "yes" cannot change the probability of any outcome by more than a small multiplicative
factor. Similarly, fixing her participation decision as "yes", any change in her stated type can only
change the probability of any outcome by a small multiplicative factor.

How should she respond to this offer? A natural conjecture is that she would experience a 
higher privacy cost for participating in the survey than not (after all, if she does not participate, 
her private type has no effect on the output of the mechanism - she need not even have provided 
it), and that she should weigh that privacy cost against the payment offered, and make her decision 
accordingly. 

However, if her privacy cost is solely some function $f(\epsilon)$ of the privacy parameter of the
mechanism, she is actually incentivized to behave quite differently. Since the privacy parameter $\epsilon$ is
independent of her input, her cost $f(\epsilon)$ will be identical whether she participates or not. Indeed,
her participation decision does not affect her privacy cost, and only affects whether she receives 
payment or not, and so she will always opt to participate in exchange for any positive payment. 
Further, she experiences the full privacy cost of the mechanism simply by being asked whether she 
wishes to participate in the survey, even if her private data is never used! 
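A minimal Python sketch of the argument above, assuming (as the model under criticism does) that the agent's privacy cost is $f(\epsilon)$ whichever action she takes. The function name and the cost function in the example are hypothetical.

```python
from typing import Callable


def participates_under_f_model(payment: float, epsilon: float,
                               f: Callable[[float], float]) -> bool:
    """Participation decision when the privacy cost is f(epsilon) no matter what.

    epsilon is a property of the mechanism, so the same cost f(epsilon) is
    subtracted on both branches; the comparison reduces to payment > 0.
    """
    utility_if_participate = payment - f(epsilon)
    utility_if_decline = -f(epsilon)
    return utility_if_participate > utility_if_decline


if __name__ == "__main__":
    steep_cost = lambda eps: 1_000.0 * eps  # hypothetical, very privacy-averse agent
    print(participates_under_f_model(0.01, 1.0, steep_cost))  # True: one cent suffices
```

However steep the hypothetical cost function, any strictly positive payment flips the decision to "participate", which is exactly the unnatural behavior described above.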

We view this as problematic and as not modeling the true decision-making process of individuals: 
in reality, the full privacy cost of a survey may not have been experienced by the individual before 
her data has been given. Furthermore, individuals are unlikely to accept arbitrarily low offers for 
their private data. One potential route to addressing this "paradox" would be to move away from
modeling the value of privacy solely in terms of an input-independent privacy guarantee. This
is the approach taken by [CCK+11]. Instead, we retain the framework of differential privacy, but
introduce a new model for how individuals reason about the cost of privacy loss. Roughly, we model
individuals' costs as a function of the differential privacy parameter of the portion of the mechanism
they participate in, and assume they do not experience cost from the parts of the mechanism that
process data that they have not provided (or that have no dependence on their data). For our
application, we consider mechanisms that operate in two stages: first, they aggregate participation
decisions by making take-it-or-leave-it offers, but do not compute on the private types collected
from the participating individuals. The "output" of this stage of the mechanism is the observed
behavior of the surveyor: the number of people approached and the prices offered.¹ The second
stage of the mechanism takes as input the reported private types of the individuals who elected to
participate. In our model, individuals who declined to provide their private types do not experience
any cost in this second stage of the mechanism.



1.3 Related Work 



In recent years, differential privacy, which was introduced in a series of papers [DMNS06, BDMN05],
has emerged as the standard solution concept for privacy in the theoretical computer science
literature. There is by now a very large literature on this fascinating topic, which we do not attempt
to survey here, instead referring the interested reader to a survey by Dwork [Dwo08].

McSherry and Talwar proposed that differential privacy could itself be used as a solution concept
in mechanism design [MT07]. They observed that any differentially private mechanism is
approximately truthful, while simultaneously having some resilience to collusion. Using differential privacy
as a solution concept (as opposed to dominant strategy truthfulness), they gave some improved
results in a variety of auction settings. Gupta et al. also used differential privacy as a solution
concept in auction design [GLM+10].

This literature was recently extended by a series of elegant papers by Nissim, Smorodinsky,
and Tennenholtz [NST12], Xiao [Xia11], Nissim, Orlandi, and Smorodinsky [NOS11], and Chen et
al. [CCK+11]. This line of work observes [NST12, Xia11] that differential privacy does not lead
to exactly truthful mechanisms, and indeed that manipulations might be easy to find, and then
seeks to design mechanisms that are exactly truthful even when agents explicitly value privacy
[Xia11, NOS11, CCK+11].

Feigenbaum, Jaggard, and Schapira considered (using a different notion of privacy) how the
implementation of an auction can affect how many bits of information are leaked about individuals'
bids [FJS10]. Specifically, they study to what extent information must be leaked in second-price
auctions and in the millionaires' problem. We consider somewhat orthogonal notions of privacy and
implementation that make our results incomparable.



Most related to this paper is an orthogonal direction initiated by Ghosh and Roth [GR11].
Ghosh and Roth consider the problem of a data analyst who wishes to buy data from a population
for the purposes of computing an accurate estimate of some population statistic. Individuals
experience cost as a function of their privacy loss (as measured by differential privacy), and must
be incentivized by a truthful mechanism to report their true costs. In particular, [GR11] show
that if individuals experience disutility as a function of differential privacy, and if costs for privacy
can be arbitrarily correlated with private types, then no individually rational direct revelation
mechanism can achieve any nontrivial accuracy. In this paper, we overcome this impossibility
result by abandoning the direct revelation model in favor of a model in which a surveyor can
approach random individuals from the population and make them take-it-or-leave-it offers, and by
introducing a slightly different model for how individuals experience cost as a function of privacy.
We note that the conversation on how individuals should experience costs as a function of privacy is
ongoing. Ghosh and Roth [GR11] suggest that privacy costs should be a function of the differential
privacy parameter of the mechanism; Xiao [Xia11] suggests that such costs should be a function of the
mutual information between the agent type and the mechanism output (such a measure requires a
prior on player types); Nissim, Orlandi, and Smorodinsky [NOS11] suggest that agent costs should merely be
upper bounded by a linear function of the privacy parameter of the mechanism; and Chen
et al. [CCK+11] suggest that the appropriate measure should be outcome dependent (although
inspired by differential privacy). The model in this paper adds to this fruitful conversation.

¹For clarity, in our analysis, we also include as part of the output of this stage a noisy (privacy-preserving) count
of the number of people accepting the highest offer we make.



Concurrently with this work, Roth and Schoenebeck [RS12] consider the problem of deriving
Bayesian optimal survey mechanisms for computing minimum variance unbiased estimators of a
population statistic from individuals who have costs for participating in the survey. Although the
motivation of this work is similar, the results are orthogonal. In this paper, we take a prior-free
approach and model costs for private access to data using the formalism of differential privacy. In
contrast, [RS12] takes a Bayesian approach, assuming a known prior over agent costs; it does
not attempt to provide any privacy guarantee, and instead only seeks to pay individuals for their
participation.



Also contemporaneously with this work, Fleischer and Lyu [FL12] consider the problem of
computing a statistic over a population where privacy costs may be correlated with private types. 
Their approach is fundamentally different from ours, however, because they crucially assume the 
surveyor has perfect knowledge of the correlation between types and costs. In the present work, we 
make no such Bayesian assumptions. 

2 Preliminaries 

The approach to modeling privacy that we use is the by-now-standard model of differential privacy. 

We think of databases as being an ordered multiset of elements from some universe $X$: $D \in X^*$,
in which each element corresponds to the data of a different individual. We call two databases
neighbors if they differ in the data of only a single individual.

Definition 2.1. Two databases of size $n$, $D, D' \in X^n$, are neighbors with respect to individual $i$ if
for all $j \neq i \in [n]$, $D_j = D'_j$.
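As a small illustration of Definition 2.1, the following Python sketch checks whether two databases are neighbors with respect to individual $i$. The helper name is hypothetical, and it uses 0-based indices rather than $[n]$.

```python
from typing import Sequence


def neighbors_wrt(d: Sequence, d_prime: Sequence, i: int) -> bool:
    """Definition 2.1 with 0-based indices.

    D and D' must have the same size n, and D_j == D'_j must hold for every
    j != i; entry i itself may differ (or coincide).
    """
    if len(d) != len(d_prime) or not 0 <= i < len(d):
        return False
    return all(x == y for j, (x, y) in enumerate(zip(d, d_prime)) if j != i)


if __name__ == "__main__":
    D, D_prime = ["yes", "no", "yes"], ["yes", "yes", "yes"]
    print(neighbors_wrt(D, D_prime, 1))  # True: only entry 1 differs
    print(neighbors_wrt(D, D_prime, 0))  # False: entry 1 differs but i = 0
```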

We can now define differential privacy. Intuitively, differential privacy promises that the output 
of a mechanism does not depend too much on any single individual's data. 



Definition 2.2 ([DMNS06]). A randomized algorithm $A$ which takes as input a database $D \in X^*$
and outputs an element of some arbitrary range $R$ is $\epsilon_i$-differentially private with respect to
individual $i$ if for all databases $D, D' \in X^*$ that are neighbors with respect to individual $i$, and for
all subsets of the range $S \subseteq R$, we have:

$$\Pr[A(D) \in S] \le \exp(\epsilon_i) \Pr[A(D') \in S].$$

$A$ is $\epsilon_i$-minimally differentially private with respect to individual $i$ if
$\epsilon_i = \inf\{\epsilon \ge 0 : A \text{ is } \epsilon\text{-differentially private with respect to individual } i\}$.
When it is clear from context, we will simply write $\epsilon_i$-differentially private to mean
$\epsilon_i$-minimally differentially private.

A simple and useful fact is that post-processing does not affect differential privacy guarantees. 

Fact 2.1. Let $A : X^* \to R$ be a randomized algorithm which is $\epsilon_i$-differentially private with respect
to individual $i$, and let $f : R \to T$ be an arbitrary (possibly randomized) function mapping the range
of $A$ to some abstract range $T$. Then the composition $f \circ A : X^* \to T$ is $\epsilon_i$-differentially private
with respect to individual $i$.
A useful distribution is the Laplace distribution.

Definition 2.3 (The Laplace Distribution). The Laplace distribution with mean 0 and scale $b$
is the distribution with probability density function $\mathrm{Lap}(x \mid b) = \frac{1}{2b}\exp\left(-\frac{|x|}{b}\right)$. We will sometimes
write $\mathrm{Lap}(b)$ to denote the Laplace distribution with scale $b$, and will sometimes abuse notation
and write $\mathrm{Lap}(b)$ simply to denote a random variable $X \sim \mathrm{Lap}(b)$.
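A sketch of the density in Definition 2.3 together with a sampler. The sampler relies on the standard fact that the difference of two independent exponential random variables with mean $b$ is distributed as $\mathrm{Lap}(b)$; this is an implementation choice for illustration, not something the text prescribes.

```python
import math
import random


def laplace_pdf(x: float, b: float) -> float:
    """Density Lap(x | b) = exp(-|x| / b) / (2b), mean 0 and scale b."""
    return math.exp(-abs(x) / b) / (2.0 * b)


def sample_laplace(b: float, rng: random.Random) -> float:
    """One draw X ~ Lap(b).

    The difference of two independent exponentials with mean b is Laplace
    with mean 0 and scale b; expovariate takes the rate 1/b.
    """
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_laplace(2.0, rng) for _ in range(50_000)]
    # For Lap(b), E[|X|] = b, so this average should be close to 2.0.
    print(sum(abs(x) for x in draws) / len(draws))
```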

A fundamental result in data privacy is that perturbing low sensitivity queries with Laplace 
noise preserves $\epsilon$-differential privacy.

Theorem 2.1 ([DMNS06]). Suppose $f : X^* \to \mathbb{R}^k$ is a function such that for all adjacent databases
$D$ and $D'$, $\|f(D) - f(D')\|_1 \le 1$. Then the procedure which on input $D$ releases $f(D) + (X_1, \ldots, X_k)$,
where each $X_i$ is an independent draw from a $\mathrm{Lap}(1/\epsilon)$ distribution, preserves $\epsilon$-differential privacy.
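The following Python sketch instantiates Theorem 2.1 for a simple counting query and numerically checks the multiplicative bound of Definition 2.2 on a grid of outputs. The query `count_ones` and the neighboring databases are illustrative assumptions; the privacy guarantee holds only if the supplied query really has $\ell_1$ sensitivity at most 1.

```python
import math
import random
from typing import Callable, List, Sequence


def laplace_mechanism(db: Sequence[int],
                      f: Callable[[Sequence[int]], List[float]],
                      epsilon: float,
                      rng: random.Random) -> List[float]:
    """Release f(db) + (X_1, ..., X_k) with X_j ~ Lap(1/epsilon) i.i.d.

    Privacy relies on the caller's promise (Theorem 2.1's hypothesis) that
    ||f(D) - f(D')||_1 <= 1 for neighboring D, D'. Each Lap(1/epsilon) draw
    is the difference of two exponentials with rate epsilon.
    """
    return [v + rng.expovariate(epsilon) - rng.expovariate(epsilon) for v in f(db)]


def output_density(y: Sequence[float], center: Sequence[float], epsilon: float) -> float:
    """Joint density of the released vector y when the true answer is `center`."""
    density = 1.0
    for yj, cj in zip(y, center):
        density *= (epsilon / 2.0) * math.exp(-epsilon * abs(yj - cj))
    return density


def count_ones(db: Sequence[int]) -> List[float]:
    """Example query with L1 sensitivity 1: changing one entry moves the count by <= 1."""
    return [float(sum(db))]


if __name__ == "__main__":
    rng = random.Random(0)
    eps = 0.5
    D, D_prime = [1, 0, 1, 1], [1, 1, 1, 1]  # neighbors w.r.t. individual 1 (0-based)
    print(laplace_mechanism(D, count_ones, eps, rng))
    # The density ratio at every grid point stays within exp(eps).
    worst = max(output_density([y / 10], count_ones(D), eps)
                / output_density([y / 10], count_ones(D_prime), eps)
                for y in range(-100, 100))
    print(worst <= math.exp(eps) + 1e-12)
```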

We consider a (possibly infinite) collection of individuals drawn from some distribution over
types $\mathcal{D}$. There exists a finite collection of private types $\mathcal{T}$. Each individual is described by a
private type $t_i \in \mathcal{T}$, as well as a nondecreasing cost function $c_i : \mathbb{R}_+ \to \mathbb{R}_+$ that measures her
disutility $c_i(\epsilon_i)$ for having her private type used in a computation with a guarantee of $\epsilon_i$-differential
privacy.

Agents interact with the mechanism as follows. The mechanism will be endowed with the ability
to select an individual $i$ uniformly at random (without replacement) from $\mathcal{D}$, by making a call to a
population oracle $O_{\mathcal{D}}$. Once an individual $i$ has been sampled, the mechanism can present $i$ with a
take-it-or-leave-it offer, which is a tuple $(p_i, \epsilon^1_i, \epsilon^2_i) \in \mathbb{R}^3_+$. Here $p_i$ represents an offered payment, and $\epsilon^1_i$
and $\epsilon^2_i$ represent two privacy parameters. The agent then makes her participation decision, which
consists of one of two actions: she can accept the offer, or she can reject the offer. If she accepts
the offer, she communicates her (verifiable) private type $t_i$ to the auctioneer, who may use it in
a computation which is $\epsilon^2_i$-differentially private with respect to agent $i$. In exchange she receives
payment $p_i$. If she rejects the offer, she need not communicate her type, and receives no payment.
Moreover, the mechanism guarantees that the bit representing whether or not agent $i$ accepts the
offer is used only in an $\epsilon^1_i$-differentially private way, regardless of her participation decision.
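One way to model this interaction in code, as a sketch: the `Offer`, `Agent`, and `PopulationOracle` names below are hypothetical, agents are assumed to expose their decision rule as a callable, and the oracle draws uniformly without replacement from a finite population (the text also allows an infinite one).

```python
import random
from dataclasses import dataclass
from typing import Callable, Generic, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Offer:
    payment: float  # p_i
    eps1: float     # privacy level promised for the participation bit
    eps2: float     # privacy level promised for the private type, if accepted


@dataclass(frozen=True)
class Agent:
    private_type: int
    accepts: Callable[[Offer], bool]  # the agent's (unobserved) decision rule


class PopulationOracle(Generic[T]):
    """Hypothetical O_D over a finite population: uniform draws without replacement."""

    def __init__(self, population: Sequence[T], rng: random.Random) -> None:
        self._remaining: List[T] = list(population)
        self._rng = rng

    def sample(self) -> T:
        idx = self._rng.randrange(len(self._remaining))
        # Swap the chosen individual to the end and pop it: O(1) removal,
        # and each draw stays uniform over those not yet approached.
        last = len(self._remaining) - 1
        self._remaining[idx], self._remaining[last] = self._remaining[last], self._remaining[idx]
        return self._remaining.pop()


def present_offer(agent: Agent, offer: Offer) -> Tuple[bool, Optional[int], float]:
    """Return (participation bit, reported type or None, payment received)."""
    if agent.accepts(offer):
        return True, agent.private_type, offer.payment
    return False, None, 0.0  # rejecting reveals no type and earns nothing


if __name__ == "__main__":
    rng = random.Random(0)
    agents = [Agent(t, lambda o, t=t: o.payment >= (10.0 if t else 1.0))
              for t in (0, 1, 0, 0, 1)]
    oracle = PopulationOracle(agents, rng)
    offer = Offer(payment=5.0, eps1=0.1, eps2=0.1)
    for _ in range(len(agents)):
        print(present_offer(oracle.sample(), offer))
```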

2.1 How Cost Functions Relate to Types 

We model agents as caring only about the privacy of their private type $t_i$, but they may also
experience a cost when information about their cost function $c_i$ is revealed, because of possible
correlations between costs and types. To capture this phenomenon while still avoiding making
Bayesian assumptions, we take the following approach.

Implicitly, there is some (possibly randomized) process $f_i$ which maps a user's private type $t$ to
her cost function $c_i = f_i(t)$, but we make no assumption about the form of this map. This takes
a worst-case view, i.e., we have no prior over individuals' cost functions. For point of reference,
in a Bayesian model, the function $f_i$ would represent the user's marginal distribution over costs
conditioned on her type. We make no Bayesian assumptions, but introduce this function $f_i$ so as to
formalize our model of utility for privacy, which is crucial to the results we give in this paper.

When an individual $i$ is faced with a take-it-or-leave-it offer, her type is used in two
computations: first, her participation decision (which may be a function of her type) is used in some
computation $A_1$ which will be $\epsilon^1_i$-differentially private. Then, if she accepts the offer, she allows
her type to be used in some computation $A_2$ which may be $\epsilon^2_i$-differentially private.
We model individuals as caring about the privacy of their cost function only insofar as it reveals
information about their private type. Because their cost function is determined as a function of
their private type, if $P$ is some predicate over cost functions and $P(c_i) = P(f_i(t_i))$ is used in a way
that guarantees $\epsilon_i$-differential privacy, then the agent experiences a privacy loss of some $\epsilon'_i \le \epsilon_i$
(which corresponds to a disutility of some $c_i(\epsilon'_i) \le c_i(\epsilon_i)$). We write $g_i(\epsilon_i) = \epsilon'_i$ to denote this
correspondence between a given privacy level and the effective privacy loss due to use of the cost
function at that level of privacy. For example, if $f_i$ is a deterministic injective mapping, then $f_i(t_i)$
is as disclosive as $t_i$ and so $g_i(\epsilon_i) = \epsilon_i$. On the other hand, if $f_i$ produces a distribution independent
of the user's type, then $g_i(\epsilon_i) = 0$ for all $\epsilon_i$.

We note that the cost function model we describe here can also incorporate other costs of 
participating in a survey not related to privacy concerns, such as valuing time or disliking talking 
to strangers. One can fold in such considerations (which contribute constant factors independent 
of the privacy parameter) without changing our model or the qualitative nature of our results, but 
such a change might result in nonlinear cost functions, violating a simplifying assumption made in 
the analysis of our mechanism's cost. 

2.2 Cost Experienced from a Take-It-Or-Leave-It Mechanism 

Definition 2.4. A Private Take-It-Or-Leave-It Mechanism is composed of two algorithms, $A_1$ and
$A_2$. $A_1$ makes offer $(p_i, \epsilon^1_i, \epsilon^2_i)$ to individual $i$ and receives a binary participation decision. If player
$i$ participates, she receives a payment of $p_i$ in exchange for her private type $t_i$. $A_1$ performs no
computation on $t_i$. The privacy parameter $\epsilon^1_i$ for $A_1$ is computed by viewing the input to $A_1$ to be
the vector of participation decisions, and the output to be the number of individuals to whom offers
were made, the offers $(p_j, \epsilon^1_j, \epsilon^2_j)$, and an $\epsilon^1_i$-differentially private count of the number of players who
chose to participate at the highest price we offer.

Following the termination of $A_1$, a separate algorithm $A_2$ computes on the reported private
types of these participating individuals and outputs a real number $s$. The privacy parameter of
$A_2$ is computed by viewing the input to be the private types of the participating agents, and the
output as $s$.

We assume that agents have quasilinear utility (cost) functions: given a payment $p_i$, an agent
$i$ who declines a take-it-or-leave-it offer (and thus receives no payment) and whose participation
decision is used in an $\epsilon^1_i$-differentially private way experiences utility $u_i = -c_i(g_i(\epsilon^1_i)) \ge -c_i(\epsilon^1_i)$.
An agent who accepts a take-it-or-leave-it offer and receives payment $p_i$, whose participation decision
is used in an $\epsilon^1_i$-differentially private way, and whose private type is subsequently used in an
$\epsilon^2_i$-differentially private way experiences utility $u_i = p_i - c_i(\epsilon^2_i + g_i(\epsilon^1_i)) \ge p_i - c_i(\epsilon^2_i + \epsilon^1_i)$.

Remark 2.1. While this model of costs, including the function $g_i$, might seem complex, note that
it captures the correct cost model in a number of situations. Suppose, for example, that costs have
correlation 1 with types, and $c_i(\epsilon) = \infty$ if and only if $t_i = 1$, otherwise $c_i(\epsilon) \ll p_i$. Then, asking
whether an agent wishes to accept an offer $(p_i, \epsilon^1_i, \epsilon^2_i)$ is equivalent to asking whether $t_i = 1$ or
not, and those accepting the offer are in effect answering this question twice. In this case, we have
$g_i(\epsilon) = \epsilon$. On the other hand, if types and costs are completely uncorrelated, then there is no privacy
loss associated with responding to a take-it-or-leave-it offer. This is captured by setting $g_i(\epsilon) = 0$.
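The two utility expressions above can be written down directly. In this sketch the cost function $c_i$ and the leakage function $g_i$ are passed in as plain callables (an assumption about representation), and the two leakage functions mirror the extreme cases of Remark 2.1.

```python
from typing import Callable

Cost = Callable[[float], float]  # c_i: nondecreasing map R+ -> R+
Leak = Callable[[float], float]  # g_i: effective privacy loss, 0 <= g_i(e) <= e


def utility_reject(c: Cost, g: Leak, eps1: float) -> float:
    """Decline: no payment; the participation bit is still used eps1-privately."""
    return -c(g(eps1))


def utility_accept(c: Cost, g: Leak, payment: float, eps1: float, eps2: float) -> float:
    """Accept: paid p_i; type used eps2-privately and the bit eps1-privately."""
    return payment - c(eps2 + g(eps1))


# The two extremes of Remark 2.1.
g_fully_correlated: Leak = lambda e: e  # answering the offer reveals the type
g_uncorrelated: Leak = lambda e: 0.0    # the decision bit says nothing about the type

if __name__ == "__main__":
    c = lambda e: 4.0 * e  # hypothetical linear cost with v_i = 4
    for g in (g_fully_correlated, g_uncorrelated):
        print(utility_accept(c, g, 1.0, 0.1, 0.1), utility_reject(c, g, 0.1))
```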

Note that by accepting an offer, agent $i$ achieves utility at least

$$p_i - c_i(\epsilon^2_i + g_i(\epsilon^1_i)) \ge p_i - c_i(\epsilon^2_i + \epsilon^1_i).$$

By rejecting an offer, agent $i$ might achieve negative utility, bounded by:

$$0 \ge -c_i(g_i(\epsilon^1_i)) \ge -c_i(\epsilon^1_i).$$
Agents wish to maximize their utility, and so the following lemma is immediate: 

Lemma 2.2. A utility-maximizing agent $i$ will accept a take-it-or-leave-it offer $(p_i, \epsilon^1_i, \epsilon^2_i)$ when
$p_i \ge c_i(\epsilon^1_i + \epsilon^2_i)$.

Proof. We simply compare the lower bound on an agent's utility when accepting an offer with an 
upper bound on an agent's utility when rejecting an offer to find that agent i will always accept 
when 

$$p_i - c_i(\epsilon^1_i + \epsilon^2_i) \ge 0.$$

□ 
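A numerical sanity check of Lemma 2.2, under illustrative assumptions: linear costs $c_i(\epsilon) = v\epsilon$ and proportional leakage $g_i(\epsilon) = \lambda\epsilon$ with $0 \le \lambda \le 1$. Wherever the lemma's condition holds on the grid, accepting should be at least as good as rejecting; below the threshold the check stays silent, since the lemma makes no claim there.

```python
from typing import Callable, Iterable


def lemma_condition(c: Callable[[float], float], payment: float,
                    eps1: float, eps2: float) -> bool:
    """Sufficient condition of Lemma 2.2: p_i >= c_i(eps1 + eps2)."""
    return payment >= c(eps1 + eps2)


def accept_weakly_better(v: float, lam: float, payment: float,
                         eps1: float, eps2: float) -> bool:
    """Accept vs. reject for c(e) = v * e and the assumed leakage g(e) = lam * e."""
    u_accept = payment - v * (eps2 + lam * eps1)
    u_reject = -v * (lam * eps1)
    return u_accept >= u_reject


def lemma_holds_on_grid(v: float, eps1: float, eps2: float,
                        payments: Iterable[float], lams: Iterable[float]) -> bool:
    """True if accepting is never worse wherever Lemma 2.2's condition holds."""
    lam_list = list(lams)
    c = lambda e: v * e
    for p in payments:
        if not lemma_condition(c, p, eps1, eps2):
            continue  # below the threshold, Lemma 2.2 makes no claim
        if not all(accept_weakly_better(v, lam, p, eps1, eps2) for lam in lam_list):
            return False
    return True


if __name__ == "__main__":
    print(lemma_holds_on_grid(3.0, 0.1, 0.2,
                              payments=[k * 0.05 for k in range(60)],
                              lams=[k / 10 for k in range(11)]))
```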

Remark 2.2. Note that this lemma is tight exactly when agent types are uncorrelated with agent
costs, i.e., when $g_i(\epsilon) = 0$. When agent types are highly correlated with costs, then rejecting an
offer becomes more costly, and agents may accept take-it-or-leave-it offers at lower prices.

We make no claims about how agents respond to offers $(p_i, \epsilon^1_i, \epsilon^2_i)$ for which $p_i < c_i(\epsilon^1_i + \epsilon^2_i)$.
Note that since agents can suffer negative utility even by rejecting offers, it is possible that they
will accept offers that lead to experiencing negative utility. Thus, in our setting, take-it-or-leave-it
offers are not necessarily truthful in the standard sense. Nevertheless, Lemma 2.2 will provide a
strong enough guarantee for us of one-sided truthfulness: we can guarantee that rational agents
will accept all offers that guarantee them non-negative utility.

Note that our mechanisms will satisfy only a relaxed notion of individual rationality: we have
not endowed agents with the ability to avoid being given a take-it-or-leave-it offer, even if
both options (taking or rejecting) would leave the agent with negative utility. Agents who reject
take-it-or-leave-it offers can experience negative utility in our mechanism because their rejection decision
is observed and used in a (differentially private) computation. Once the take-it-or-leave-it offer
has been presented, agents are free to behave selfishly. We feel that both of these relaxations
(of truthfulness and individual rationality) are well motivated by real-world mechanisms in which
surveyors may approach individuals in public, and crucially, they are necessary in overcoming the
impossibility result in [GR11].



Most of our analysis holds for arbitrary cost functions $c_i$, but we do a benchmark cost comparison
assuming linear cost functions of the form $c_i(\epsilon) = v_i \epsilon$, for some quantity $v_i$.



2.3 Accuracy 

Our mechanism is designed to be used by a data analyst who wishes to compute some statistic
about the private type distribution of the population. Specifically, the analyst gives the mechanism
some function $Q : \mathcal{T} \to [0,1]$, and wishes to compute $a = \mathbb{E}_{t_i \sim \mathcal{D}}[Q(t_i)]$, the average value that $Q$
takes among the population of agents $\mathcal{D}$. The analyst wishes to obtain an accurate answer, defined
as follows:

Definition 2.5. A randomized algorithm, given as input access to a population oracle $O_{\mathcal{D}}$, which
outputs an estimate $M(O_{\mathcal{D}}) = \hat{a}$ of a statistic $a = \mathbb{E}_{t_i \sim \mathcal{D}}[Q(t_i)]$, is $\alpha$-accurate if:

$$\Pr[|\hat{a} - a| \ge \alpha] \le \frac{1}{3},$$

where the probability is taken over the internal randomness of the algorithm and the randomness
of the population oracle.

The constant $\frac{1}{3}$ is arbitrary, and is fixed only for convenience. It can be replaced with any other
constant value without qualitatively affecting any of the results in this paper.
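The accuracy requirement can be checked empirically for a given estimator by repeated simulation. The sketch below is a hypothetical Monte Carlo harness; the toy estimator (an average of Bernoulli draws standing in for $Q(t_i)$) and the failure-probability constant $\delta = 1/3$ used in the example are assumptions for illustration.

```python
import random
from typing import Callable


def empirical_failure_rate(estimator: Callable[[random.Random], float],
                           true_value: float, alpha: float,
                           trials: int = 2_000, seed: int = 0) -> float:
    """Fraction of independent runs with |a_hat - a| >= alpha.

    Each run draws fresh randomness from one seeded generator, which stands in
    for both the algorithm's coins and the population oracle's draws.
    """
    rng = random.Random(seed)
    failures = sum(abs(estimator(rng) - true_value) >= alpha for _ in range(trials))
    return failures / trials


def sample_mean_estimator(p: float, n: int) -> Callable[[random.Random], float]:
    """Hypothetical estimator: average of n i.i.d. Bernoulli(p) values of Q(t_i)."""
    return lambda rng: sum(rng.random() < p for _ in range(n)) / n


if __name__ == "__main__":
    delta = 1 / 3  # failure-probability constant (assumed, see Definition 2.5)
    rate = empirical_failure_rate(sample_mean_estimator(0.3, 400), 0.3, alpha=0.1)
    print(rate, rate <= delta)
```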

2.4 Cost 

We will evaluate the cost incurred by our mechanism using a bi-criteria benchmark: for a
parametrization of our mechanism which gives accuracy $\alpha$, we will compare our mechanism's cost to a
benchmark algorithm that has perfect knowledge of each individual's cost function, but is constrained
to make every individual the same take-it-or-leave-it offer (the same fixed price is offered to each
person in exchange for some fixed $\epsilon'$-differentially private computation on her private type) while
obtaining $\alpha/32$ accuracy.² That is, the benchmark mechanism must be "envy-free", and may
obtain better accuracy than we do, but only by a constant factor. On the other hand, the benchmark
mechanism has several advantages: it has full knowledge of each player's cost, and need not be
concerned about sample error. For simplicity, we will state our benchmark results in terms of
individuals with linear cost functions.

3 Mechanism and Analysis 

3.1 The Take-It-Or-Leave-It Mechanism 

In this section we describe our mechanism, which we present in Figure 1. It is not a direct revelation
mechanism, and instead is based on the ability to present take-it-or-leave-it offers to uniformly
randomly selected individuals from some population. This is intended to model the scenario in
which a surveyor is able to stand in a public location and ask questions or present offers to
passers-by (who are assumed to arrive randomly). Those passing the surveyor have the freedom to accept
or reject the offer that they are presented, but they cannot avoid having the question posed to
them.

Our mechanism consists of two algorithms. Algorithm $A_1$ is run on samples from the population
with privacy guarantee $\epsilon_0$, until it terminates at some final epoch $j$; then algorithm $A_2$ is run
on (AcceptedSet$_j$, EpochSize($j$), $\epsilon_0$).

3.2 Privacy 

Note that our mechanism offers the same $\epsilon_0$ in every take-it-or-leave-it offer it makes.

Theorem 3.1. The Harassment Mechanism is $\epsilon_0$-differentially private with respect to the
participation decision of each individual approached.

Proof. The observable output of $A_1$ is the total number of people approached, the payments offered,
and the noisy count of the number of players who accepted the offer in the final epoch.
The first two of these are functions only of the choice of the final epoch $j$ at which the algorithm
²Note that we have made no attempt to optimize the constant.
Algorithm 1 Algorithm $A_1$, the "Harassment Mechanism". It is parametrized by an accuracy
level $\alpha$, and we view its input to be the participation decision of each individual approached with a
take-it-or-leave-it offer, and its observable output to be the number of individuals approached, the
payments offered, and the noisy count of the number of players who accepted the offer in the final
epoch.

Let EpochSize(j) ^ mi^^i+^. 
Let j ^ 1. 
Let eo = a 
while TRUE do 

Let AcceptedSetj ^ and NumberAccepted^ ^ and Epoch^ <— 
for i = 1 to EpochSize(j) do 

Sample a new individual Xi from V. 
Let Epochj ^ Epoch^- U {xi}. 

Offer Xi the take-it-or-leave it offer (pj, eo,eo) with pj = (1 + Tjy 
if i accepts then 

Let AcceptedSetj ^ AcceptedSetj U {xj} and 

Number Accepted^- ^ Number Acceptedj + 1. 
Let Uj ~ Lap(l/eo) and NoisyCount^ = Number Accepted^ + 
if NoisyCountj > (1 — a/8)EpochSize(j) then 

Call Estimate(AcceptedSetj, EpochSize(j), eo). 
else 

Let + 1 



Algorithm 2 The Estimation Mechanism. We view its inputs to be the private types of each 
participating individual from the final epoch, and its output is a single numeric estimate. 
Estimate(AcceptedSet, EpochSize, e): 

Let a = Ex.eAcccptedSet Qi^i) + Lap(l/e) 
Output a/EpochSize. 
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halts. But this decision is made as a function only of the quantity Noisy County, which preserves 
eo-differential privacy by the properties of the Laplace mechanism, and the fact that the vector 



(Number Accepted^, . . . , Number Accepted^) 



has sensitivity 1. 



□ 
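The sensitivity-1 claim in the proof of Theorem 3.1 can be checked mechanically on a toy input. In this sketch each individual's decision is a single 0/1 entry (accept/decline) in exactly one epoch's list, mirroring the fact that Algorithm 1 samples a new individual for every offer; the helper names `number_accepted` and `max_l1_change` are hypothetical.

```python
from typing import List


def number_accepted(decisions_by_epoch: List[List[int]]) -> List[int]:
    # (NumberAccepted_1, ..., NumberAccepted_j): one count per epoch.
    return [sum(epoch) for epoch in decisions_by_epoch]


def max_l1_change(decisions_by_epoch: List[List[int]]) -> int:
    # Flip each individual's accept (1) / decline (0) decision in turn and record
    # the largest l1 change in the vector of per-epoch counts.
    base = number_accepted(decisions_by_epoch)
    worst = 0
    for e, epoch in enumerate(decisions_by_epoch):
        for i in range(len(epoch)):
            neighbor = [list(ep) for ep in decisions_by_epoch]
            neighbor[e][i] = 1 - neighbor[e][i]
            change = sum(abs(u - v) for u, v in zip(base, number_accepted(neighbor)))
            worst = max(worst, change)
    return worst


print(max_l1_change([[1, 0, 1], [0, 0, 1, 1], [1, 1, 1, 0, 1]]))  # 1
```

Because every individual contributes to exactly one coordinate, no flip can move more than one count, which is why one Laplace draw per epoch with scale $1/\varepsilon_0$ suffices.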



Theorem 3.2. The Estimation Mechanism is $\varepsilon_0$-differentially private with respect to the participation decision and private type of each individual approached.

Proof. The theorem follows directly from the privacy of the Laplace mechanism. □

Note that these two theorems, together with Lemma 2.2, imply that each agent will accept its take-it-or-leave-it offer of $(p_j, \varepsilon_0, \varepsilon_0)$ whenever $p_j \geq c_i(2\varepsilon_0)$.

3.3 Accuracy

Theorem 3.3. Our overall mechanism, which first runs the Harassment Mechanism and then hands the types of the accepting players from the final epoch to the Estimation Mechanism, is $\alpha$-accurate.

Proof. We need only control four sources of error, which we do in turn. Suppose that the algorithm halts and outputs an answer computed from epoch $j$.

We first consider the effects of sample error, namely, the difference between the statistic among those individuals approached in an epoch (not just those who accepted our offer) and the true value of the statistic in the underlying population.

Lemma 3.4. Except with probability at most 1/12, for every epoch $j \geq 1$ we have:

$$\left| \frac{1}{\mathrm{EpochSize}(j)} \sum_{x_i \in \mathrm{Epoch}_j} Q(x_i) - \mathbb{E}_{x \sim \mathcal{D}}[Q(x)] \right| \leq \frac{\alpha}{4}.$$

Proof. This follows from a Chernoff bound and a union bound. Because the individuals in each epoch are sampled i.i.d. from $\mathcal{D}$, by the additive version of the Chernoff bound, we have for each epoch $j$:

$$\Pr\left[ \left| \frac{1}{\mathrm{EpochSize}(j)} \sum_{x_i \in \mathrm{Epoch}_j} Q(x_i) - \mathbb{E}_{x \sim \mathcal{D}}[Q(x)] \right| > \frac{\alpha}{4} \right] \leq 2\exp\left(-\frac{\alpha^2}{8} \cdot \mathrm{EpochSize}(j)\right) = 2\exp\left(-\frac{100}{8}\log(j+1)\right).$$
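By a union bound, we now have:

$$\Pr\left[ \max_j \left| \frac{1}{\mathrm{EpochSize}(j)} \sum_{x_i \in \mathrm{Epoch}_j} Q(x_i) - \mathbb{E}_{x \sim \mathcal{D}}[Q(x)] \right| > \frac{\alpha}{4} \right] \leq \sum_{j=1}^{\infty} 2\left(\frac{1}{j+1}\right)^{100/8} < \frac{1}{12}.$$

□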
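A quick numeric check of the two facts the proof of Lemma 3.4 relies on: the Chernoff exponent $2(\alpha/4)^2\,\mathrm{EpochSize}(j)$ equals $\frac{100}{8}\log(j+1)$ for every $\alpha$ (using the unrounded epoch size), and the union-bound sum is below $1/12$. The function names are hypothetical, and the infinite sum is truncated, which is harmless here because the omitted tail is negligible for exponent $12.5$.

```python
import math


def chernoff_exponent(j: int, alpha: float) -> float:
    # 2 * (alpha/4)^2 * EpochSize(j), with the unrounded EpochSize(j) = 100 log(j+1) / alpha^2.
    return 2 * (alpha / 4) ** 2 * 100 * math.log(j + 1) / alpha ** 2


def union_bound(power: float, coeff: float = 2.0, terms: int = 100_000) -> float:
    # Truncated sum_{j >= 1} coeff * (1/(j+1))^power; the omitted tail is negligible.
    return sum(coeff * (j + 1) ** (-power) for j in range(1, terms + 1))


print(abs(chernoff_exponent(5, 0.1) - 12.5 * math.log(6)) < 1e-9)  # True
print(union_bound(100 / 8) < 1 / 12)                              # True
```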



We next consider the impact of the noise added for the purpose of maintaining differential 
privacy in the Harassment Mechanism. 

Lemma 3.5. Except with probability at most 1/12, for every epoch $j \geq 1$ we have:

$$|\nu_j| \leq \frac{\alpha}{8} \cdot \mathrm{EpochSize}(j).$$

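Proof. By the properties of the Laplace distribution, if a random variable $Y \sim \mathrm{Lap}(b)$, then $\Pr[|Y| > t \cdot b] = \exp(-t)$. Since $\nu_j \sim \mathrm{Lap}(1/\varepsilon_0)$ with $\varepsilon_0 = \alpha$, the event $|\nu_j| > \frac{\alpha}{8}\cdot\mathrm{EpochSize}(j)$ corresponds to $t = \frac{\alpha^2}{8}\,\mathrm{EpochSize}(j) = \frac{100}{8}\log(j+1)$. By a union bound, we have:

$$\Pr\left[\exists j : |\nu_j| > \frac{\alpha}{8}\cdot\mathrm{EpochSize}(j)\right] \leq \sum_{j=1}^{\infty} \Pr\left[|\nu_j| > \frac{\alpha}{8}\cdot\mathrm{EpochSize}(j)\right] = \sum_{j=1}^{\infty} \exp\left(-\frac{100}{8}\log(j+1)\right) = \sum_{j=1}^{\infty}\left(\frac{1}{j+1}\right)^{100/8} < \frac{1}{12}.$$

□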
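The Laplace tail identity $\Pr[|Y| > t\cdot b] = \exp(-t)$ used in Lemmas 3.5 and 3.6 can be sanity-checked by simulation. This sketch assumes a scale $b = 2$ and samples $Y$ as the difference of two i.i.d. exponentials with mean $b$; `empirical_tail` is a hypothetical helper, and the printed values are Monte Carlo estimates, so they agree with $\exp(-t)$ only up to sampling error.

```python
import math
import random


def empirical_tail(t: float, b: float = 2.0, n: int = 100_000, seed: int = 0) -> float:
    # Monte Carlo estimate of Pr[|Y| > t*b] for Y ~ Lap(b), sampling Y as the
    # difference of two i.i.d. exponentials with mean b.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        y = rng.expovariate(1 / b) - rng.expovariate(1 / b)
        if abs(y) > t * b:
            hits += 1
    return hits / n


for t in (0.5, 1.0, 2.0):
    print(t, round(empirical_tail(t), 3), round(math.exp(-t), 3))
```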



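Note that there is an additional loss of accuracy due to the fact that we target only a $(1 - \alpha/8)$ participation level in each epoch. Concretely, the final epoch $j$ halts only when $\mathrm{NoisyCount}_j \geq (1 - \alpha/8)\,\mathrm{EpochSize}(j)$, so whenever the conclusion of Lemma 3.5 holds, $\mathrm{NumberAccepted}_j \geq (1 - \alpha/4)\,\mathrm{EpochSize}(j)$. Since $Q$ is a predicate taking values in $\{0, 1\}$, the at most $(\alpha/4)\,\mathrm{EpochSize}(j)$ individuals in $\mathrm{Epoch}_j$ who declined our offer change $\sum Q(x_i)$ by at most $(\alpha/4)\,\mathrm{EpochSize}(j)$.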

We must also consider the impact of the differential privacy guarantee we give on the statistic 
output by the Estimation Mechanism. 

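Lemma 3.6. Except with probability at most 1/12, the noisy sum $a$ computed by the Estimation Mechanism on the final epoch $j$ satisfies:

$$\left| a - \sum_{x_i \in \mathrm{AcceptedSet}_j} Q(x_i) \right| \leq \frac{\alpha}{4}\,\mathrm{EpochSize}(j).$$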



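Proof. By the properties of the Laplace distribution, we have:

$$\Pr\left[ \left| a - \sum_{x_i \in \mathrm{AcceptedSet}_j} Q(x_i) \right| > \frac{\alpha}{4}\,\mathrm{EpochSize}(j) \right] = \exp\left(-\frac{100}{4}\log(j+1)\right) = \left(\frac{1}{j+1}\right)^{100/4} < \frac{1}{12}.$$

□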



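We can now finish the proof. Except with probability $3 \cdot \frac{1}{12} = \frac{1}{4}$, the conclusions of all three previous lemmas hold. We therefore have, by the triangle inequality (the middle term is bounded using Lemma 3.5 and the halting rule, as noted above):

$$\left| \frac{a}{\mathrm{EpochSize}(j)} - \mathbb{E}[Q(x)] \right| \leq \frac{\left| a - \sum_{x_i \in \mathrm{AcceptedSet}_j} Q(x_i) \right|}{\mathrm{EpochSize}(j)} + \frac{\left| \sum_{x_i \in \mathrm{AcceptedSet}_j} Q(x_i) - \sum_{x_i \in \mathrm{Epoch}_j} Q(x_i) \right|}{\mathrm{EpochSize}(j)} + \left| \frac{1}{\mathrm{EpochSize}(j)} \sum_{x_i \in \mathrm{Epoch}_j} Q(x_i) - \mathbb{E}[Q(x)] \right| \leq \frac{\alpha}{4} + \frac{\alpha}{4} + \frac{\alpha}{4} < \alpha,$$

which completes the proof.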

Note that we have made no attempt to optimize the constants in this section. 



□ 



3.4 Benchmark Comparison 

In this section we compare the cost of our mechanism to the cost of an omniscient mechanism that is constrained to make envy-free offers and achieve $O(\alpha)$-accuracy. For the purposes of the cost comparison, in this section we assume that the individuals our algorithm approaches have linear cost functions: $c_i(\varepsilon) = v_i \varepsilon$ for some $v_i \in \mathbb{R}^+$.

We will use a result of Ghosh and Roth, translated into our setting. 

Theorem 3.7 ([GR11]). Let $0 < \alpha < 1$. Given any finite sample of size $n$, any differentially private mechanism that is $\alpha/4$-accurate must select a set of agents $H \subseteq [n]$ such that:

1. $\varepsilon_i \geq \frac{4}{\alpha n}$ for all $i \in H$;
2. $|H| \geq (1 - \alpha)n$.

Let $v(\alpha)$ be the smallest value $v$ such that $\Pr_{x_i \sim \mathcal{D}}[v_i \leq v] \geq 1 - \alpha$. In other words, $(2\varepsilon \cdot v(\alpha), \varepsilon, \varepsilon)$ is the cheapest take-it-or-leave-it offer for $\varepsilon$ units of privacy that, in the underlying population distribution, would be accepted with probability at least $1 - \alpha$. It follows immediately that:
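To make the definition of $v(\alpha)$ concrete, the sketch below computes it for an empirical distribution, treating a finite list of per-unit privacy costs $v_i$ as a stand-in for $\mathcal{D}$ (an assumption of the sketch). It is simply the $(1-\alpha)$ empirical quantile taken from the sorted sample; `v_of` is a hypothetical name.

```python
import math
from typing import Sequence


def v_of(alpha: float, per_unit_costs: Sequence[float]) -> float:
    # Empirical v(alpha): the smallest observed v with (fraction of v_i <= v) >= 1 - alpha.
    costs = sorted(per_unit_costs)
    n = len(costs)
    # At least k = ceil((1 - alpha) n) costs must be <= v; the tiny slack guards
    # against floating-point error when (1 - alpha) n is an integer.
    k = max(1, math.ceil((1 - alpha) * n - 1e-9))
    return costs[k - 1]


print(v_of(0.1, list(range(1, 11))))   # 9
print(v_of(0.25, list(range(1, 11))))  # 8
```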
Observation 3.1. Any $(\alpha/32)$-accurate mechanism that makes the same take-it-or-leave-it offer to every individual $x_i \sim \mathcal{D}$ must in expectation pay in total at least $\frac{32(1 - \alpha/8)}{\alpha} \cdot v(\alpha/8)$. Note that because here we assume cost functions are linear, this quantity is fixed independent of the number of agents the mechanism draws from $\mathcal{D}$.

Proof. Note that if the benchmark algorithm were to incur sample error, this would only strengthen our lower bound.

By Theorem 3.7, any $\alpha/32$-accurate algorithm must offer a high enough price to get a participation rate of at least $(1 - \alpha/8)$ at a privacy level of at least $\varepsilon = 32/(\alpha n)$. The cost per participant for such a participation rate is $\frac{32}{\alpha n} \cdot v(\alpha/8)$ by the definition of $v$. Suppose the mechanism samples $n$ individuals. Then the mechanism must in expectation pay $v(\alpha/8) \cdot 32/(\alpha n)$ to $n(1 - \alpha/8)$ individuals, for a total of $\frac{32(1 - \alpha/8)}{\alpha} \cdot v(\alpha/8)$. □
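The accounting in the proof of Observation 3.1 can be written as a one-line function, which makes it easy to see that $n$ cancels. The helper name and the example values $\alpha = 0.1$, $v(\alpha/8) = 50$ are hypothetical.

```python
def benchmark_payment(alpha: float, v_alpha_over_8: float, n: int) -> float:
    # Observation 3.1's accounting: each of n(1 - alpha/8) participants is paid
    # v(alpha/8) * 32 / (alpha n) for privacy level 32 / (alpha n).
    per_person = v_alpha_over_8 * 32 / (alpha * n)
    return n * (1 - alpha / 8) * per_person


for n in (1_000, 1_000_000):
    print(n, benchmark_payment(0.1, 50.0, n))
```

Both sample sizes give (up to floating-point rounding) the same total, $32(1 - \alpha/8)\,v(\alpha/8)/\alpha$, because the per-person payment shrinks as $1/n$ while the number of participants grows as $n$.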

We now bound the expected cost of our mechanism and compare it to our benchmark cost, $\mathrm{BenchMarkCost} = \Theta\left(\frac{v(\alpha/8)}{\alpha}\right)$.

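Theorem 3.8. The total expected cost of our mechanism is at most:

$$O\left(\log\log(\alpha \cdot v(\alpha/8)) \cdot \mathrm{BenchMarkCost} + \frac{1}{\alpha^2}\right).$$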

Remark 3.1. Note that the additive $1/\alpha^2$ term is necessary only in the case in which $v(\alpha/8) \leq (1+\eta)/\alpha$, i.e., only in the case in which the very first offer will be accepted by a $1 - \alpha/8$ fraction of players with high probability. In this case, we have started off offering too much money right off the bat. An additive term is necessary, intuitively, because we cannot compete with the benchmark cost in the case in which the benchmark cost is arbitrarily small.^

Proof. Let $j^*$ be the minimum epoch number such that the price offered is at least $v(\alpha/8)\varepsilon_0 = \alpha v(\alpha/8)$: $p_{j^*} = (1+\eta)^{j^*} \geq v(\alpha/8) \cdot \alpha$. This is the first round at which the price offered is high enough to guarantee an expected rate of participation of at least $(1 - \alpha/8)$. Note that if $j^* > 1$ we also have $p_{j^*} \leq (1+\eta)\,v(\alpha/8) \cdot \alpha$. Our proof will proceed by arguing that $\mathbb{E}[\mathrm{MechanismCost}]$ is within a small constant factor of the cost incurred during epoch $j^*$.
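The definition of $j^*$ and the sandwich $(1+\eta)^{j^*-1} < \alpha v(\alpha/8) \leq (1+\eta)^{j^*} \leq (1+\eta)\,\alpha v(\alpha/8)$ (for $j^* > 1$) can be illustrated with a direct search. The values $\alpha = 0.1$, $v(\alpha/8) = 1000$, $\eta = 0.1$ below are hypothetical; a loop is used instead of a closed-form logarithm to avoid rounding at the boundary.

```python
def j_star(alpha: float, v_alpha_over_8: float, eta: float) -> int:
    # Smallest epoch j >= 1 whose price (1 + eta)^j reaches alpha * v(alpha/8).
    target = alpha * v_alpha_over_8
    j = 1
    while (1 + eta) ** j < target:
        j += 1
    return j


j = j_star(0.1, 1000.0, 0.1)
print(j)  # 49
print((1.1) ** (j - 1) < 100 <= (1.1) ** j <= 1.1 * 100)  # True
```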

We first argue that the total cost incurred during all epochs $j < j^*$ is comparable to the cost incurred at epoch $j^*$, which is at most $(1+\eta)^{j^*} \cdot \mathrm{EpochSize}(j^*)$. We will write $\mathrm{Cost}(i) = p_i \cdot |\mathrm{AcceptedSet}_i|$ to denote the total cost of epoch $i$. Therefore, by the definition of $j^*$, we have in the case in which $j^* > 1$:





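$$\mathrm{Cost}(j^*) \leq (1+\eta)^{j^*}\,\mathrm{EpochSize}(j^*) \leq (1+\eta)\,\alpha\,v(\alpha/8) \cdot \frac{100\log(j^*+1)}{\alpha^2} = O\left(\frac{v(\alpha/8)\log j^*}{\alpha}\right) = O\left(\frac{v(\alpha/8)\log\log(\alpha \cdot v(\alpha/8))}{\alpha}\right) = O\left(\log\log(\alpha \cdot v(\alpha/8)) \cdot \mathrm{BenchMarkCost}\right),$$

where the second equality uses $j^* = O(\log(\alpha \cdot v(\alpha/8)))$ for constant $\eta$.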



^We thank Lisa Fleischer and Yu-Han Lyu for pointing out the need for the additive term. 
In the case in which $j^* = 1$, we have $\mathrm{Cost}(j^*) \leq (1+\eta) \cdot \mathrm{EpochSize}(1) = O(1/\alpha^2)$. Thus, in both cases we have:

$$\mathrm{Cost}(j^*) \leq O\left(\log\log(\alpha \cdot v(\alpha/8)) \cdot \mathrm{BenchMarkCost} + \frac{1}{\alpha^2}\right).$$

Therefore, the theorem will be proven if we can argue that $\mathbb{E}[\mathrm{MechanismCost}] = O(\mathrm{Cost}(j^*))$. The remainder of the proof will establish this claim.

Theorem 3.9. $\mathbb{E}[\mathrm{MechanismCost}] = O(\mathrm{Cost}(j^*))$ whenever $\eta$ is a constant such that $c_1 \leq \eta \leq 3/17 - c_2$, where $c_1$ and $c_2$ are constants bounded away from 0.

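Proof. We first argue that the contribution to the cost of epochs $j \leq j^*$ is small.

Lemma 3.10.

$$\sum_{i=1}^{j^*} \mathrm{Cost}(i) \leq \frac{(1+\eta)^{j^*+1}}{\eta} \cdot \mathrm{EpochSize}(j^*).$$

Proof.

$$\sum_{i=1}^{j^*} \mathrm{Cost}(i) = \sum_{i=1}^{j^*} (1+\eta)^i\,|\mathrm{AcceptedSet}_i| \leq \mathrm{EpochSize}(j^*) \sum_{i=1}^{j^*} (1+\eta)^i \leq \frac{(1+\eta)^{j^*+1}}{\eta} \cdot \mathrm{EpochSize}(j^*),$$

where the first inequality uses $|\mathrm{AcceptedSet}_i| \leq \mathrm{EpochSize}(i) \leq \mathrm{EpochSize}(j^*)$ for $i \leq j^*$, and the second sums the geometric series.

□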
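Lemma 3.10 can be checked numerically in its worst case, where everyone approached in epochs $1, \ldots, j^*$ accepts, so $\mathrm{Cost}(i) = (1+\eta)^i\,\mathrm{EpochSize}(i)$. The sketch uses the unrounded $\mathrm{EpochSize}(j) = 100\log(j+1)/\alpha^2$ and hypothetical parameter grids; the helper names are illustrative only.

```python
import math


def epoch_size(j: int, alpha: float) -> float:
    # Unrounded EpochSize(j) = 100 log(j + 1) / alpha^2.
    return 100 * math.log(j + 1) / alpha ** 2


def worst_case_cost(j_star: int, eta: float, alpha: float) -> float:
    # Cost of epochs 1..j* if everyone approached accepts: sum of (1+eta)^i * EpochSize(i).
    return sum((1 + eta) ** i * epoch_size(i, alpha) for i in range(1, j_star + 1))


def lemma_3_10_bound(j_star: int, eta: float, alpha: float) -> float:
    # Right-hand side of Lemma 3.10.
    return (1 + eta) ** (j_star + 1) / eta * epoch_size(j_star, alpha)


print(all(worst_case_cost(j, eta, 0.1) <= lemma_3_10_bound(j, eta, 0.1)
          for j in range(1, 80) for eta in (0.05, 0.1, 0.15)))  # True
```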



We next argue that the contribution to the expected cost of the algorithm of epochs j > j* is 
small. 

Lemma 3.11. For any epoch $t > j^*$, the probability that the algorithm reaches epoch $t$ before halting is at most $(17/20)^{t - j^*}$.

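Proof. First recall that by definition of $j^*$, we have for any $j \geq j^*$:

$$\Pr[x_i \text{ accepts } (p_j, \varepsilon_0, \varepsilon_0)] \geq 1 - \alpha/8.$$

Since the median of a binomial random variable is at least the floor of its mean, we therefore have:

$$\Pr\left[|\mathrm{AcceptedSet}_j| < (1 - \alpha/8) \cdot \mathrm{EpochSize}(j) - 1\right] \leq \frac{1}{2}.$$

Note that round $j$ will be the final round if $|\mathrm{AcceptedSet}_j| + \nu_j \geq (1 - \alpha/8)\,\mathrm{EpochSize}(j)$. Conditioned on the event $|\mathrm{AcceptedSet}_j| \geq (1 - \alpha/8) \cdot \mathrm{EpochSize}(j) - 1$, we have by the properties of the Laplace distribution:

$$\Pr\left[|\mathrm{AcceptedSet}_j| + \nu_j \geq (1 - \alpha/8)\,\mathrm{EpochSize}(j)\right] \geq \Pr[\nu_j \geq 1] = \frac{1}{2}\exp(-\varepsilon_0) \geq \frac{3}{10},$$

where the last inequality holds whenever $\varepsilon_0 = \alpha \leq \ln(5/3)$. Therefore, each round $j \geq j^*$ is the final round with probability at least $\frac{1}{2} \cdot \frac{3}{10} = \frac{3}{20}$. Since each of these events is independent, the probability that the algorithm does not halt before epoch $t$ is at most $(17/20)^{t - j^*}$, as desired. □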
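The two probabilities in the proof of Lemma 3.11 can be checked numerically for a few hypothetical choices of $\alpha$ and epoch size $n$: the exact binomial probability of falling below $(1-\alpha/8)n - 1$ when each person accepts with probability $1 - \alpha/8$, and the Laplace probability $\Pr[\nu \geq 1] = \frac{1}{2}e^{-\varepsilon_0}$. The helper names are illustrative only; note that the $3/10$ bound holds at $\alpha = 0.5$ but fails at, e.g., $\alpha = 1$.

```python
import math


def prob_below(n: int, p: float, threshold: float) -> float:
    # Exact Pr[Bin(n, p) < threshold], summing the binomial pmf.
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1) if k < threshold)


def halt_given_enough_accepts(eps0: float) -> float:
    # Pr[nu >= 1] for nu ~ Lap(1/eps0), which equals exp(-eps0) / 2.
    return 0.5 * math.exp(-eps0)


for alpha in (0.1, 0.3, 0.5):
    p = 1 - alpha / 8          # per-person acceptance probability at epochs j >= j*
    worst = max(prob_below(n, p, p * n - 1) for n in (50, 120, 300))
    print(alpha, worst <= 0.5, halt_given_enough_accepts(alpha) >= 0.3)
```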






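Write $H_j$ for the event that the mechanism does not halt before round $j$. The expected cost of the mechanism is then at most:

$$\begin{aligned}
&\sum_{i=1}^{j^*} \mathrm{Cost}(i) + \sum_{j=j^*+1}^{\infty} (1+\eta)^j \cdot \mathrm{EpochSize}(j) \cdot \Pr[H_j] \\
&\leq \frac{(1+\eta)^{j^*+1}}{\eta} \cdot \mathrm{EpochSize}(j^*) + \sum_{j=j^*+1}^{\infty} (1+\eta)^j \cdot \mathrm{EpochSize}(j) \cdot \left(\frac{17}{20}\right)^{j - j^*} \\
&= O\left((1+\eta)^{j^*} \cdot \mathrm{EpochSize}(j^*)\right)
\end{aligned}$$

whenever there exist constants $c_1, c_2$ bounded away from 0 such that $c_1 \leq \eta \leq 3/17 - c_2$: the lower bound on $\eta$ controls the $1/\eta$ factor, and the upper bound makes $(1+\eta) \cdot \frac{17}{20}$ a constant less than 1, so the terms of the second sum decay geometrically while $\mathrm{EpochSize}(j)$ grows only logarithmically. □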
Therefore, we have shown that

$$\mathbb{E}[\mathrm{MechanismCost}] = O(\mathrm{Cost}(j^*)) = O\left(\log\log(\alpha \cdot v(\alpha/8)) \cdot \mathrm{BenchMarkCost} + \frac{1}{\alpha^2}\right),$$

which completes the proof. □
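One way to see why the expected-cost bound in the proof of Theorem 3.9 is $O((1+\eta)^{j^*} \cdot \mathrm{EpochSize}(j^*))$ is to divide it by that quantity. Since $\mathrm{EpochSize}(j) \propto \log(j+1)$, $\alpha$ cancels and the ratio depends only on $j^*$ and $\eta$. This sketch truncates the infinite sum (an assumption that only matters when it diverges) and uses the hypothetical name `cost_ratio`.

```python
import math


def cost_ratio(j_star: int, eta: float, terms: int = 5000) -> float:
    # Right-hand side of the expected-cost bound divided by (1+eta)^{j*} EpochSize(j*).
    # Since EpochSize(j) is proportional to log(j+1), alpha cancels out.
    r = (1 + eta) * 17 / 20                     # per-epoch growth factor of the tail terms
    head = (1 + eta) / eta                      # Lemma 3.10 term
    tail = sum(r ** k * math.log(j_star + k + 1) / math.log(j_star + 1)
               for k in range(1, terms))        # epochs j* + k for k >= 1, truncated
    return head + tail


for j in (1, 10, 100, 1000):
    print(j, round(cost_ratio(j, 0.1), 2))      # stays bounded as j* grows
print(cost_ratio(1, 0.2) > 1e6)                 # eta > 3/17: the truncated sum blows up
```

With $\eta = 0.1$ the ratio is largest at $j^* = 1$ and decreases as $j^*$ grows, so a single constant bounds it; with $\eta = 0.2 > 3/17$ the growth factor exceeds 1 and no such constant exists.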

4 Discussion 

In this paper, we have proposed a method for accurately estimating a statistic from a population 
that experiences cost as a function of their privacy loss. The statistics we consider here take the form 
of the expectation of some predicate over the population. We leave to future work the consideration
of other, nonlinear, statistics. We have circumvented the impossibility result of [GR11] by using a
mechanism empowered with the ability to approach individuals and make them take-it-or-leave-it
offers (instead of relying on a direct revelation mechanism), and by relaxing the measure by which
individuals experience privacy loss. Moving away from direct revelation mechanisms seems to us 
to be inevitable: if costs for privacy can be correlated with private data, then merely asking for 
individuals to report their costs is inevitably disclosive, for any reasonable measure of privacy. On 
the other hand, we do not claim that the model we use for how individuals experience cost as a 
function of privacy is "the" right one. Nevertheless, we have argued that some relaxation away from 
individuals experiencing privacy cost entirely as a function of the differential privacy parameter of 
the entire mechanism is inevitable (as made particularly clear in the setting of take-it-or-leave-it 
offers, in which individuals in this model would accept arbitrarily low offers). In particular, we 
believe that the style of survey mechanism presented in this paper, in which the mechanism may 
approach individuals with take-it-or-leave-it offers, is realistic, and any reasonable model for how 
individuals value their privacy should predict reasonable behavior in the face of such a mechanism. 